feat: Libraries v2 backup endpoint#37419
Conversation
|
Thanks for the pull request, @rodmgwgu! This repository is currently maintained by Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review. 🔘 Get product approvalIf you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide contextTo help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:
🔘 Get a green buildIf one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green. DetailsWhere can I find more information?If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources: When can I expect my changes to be merged?Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:
💡 As a result it may take up to several weeks or months to complete a review and merge your PR. |
ab2af32 to
b9d8748
Compare
|
Could you please include the github issue in your description? Something like "Resolves: openedx/openedx-core#387" at the top. |
a9e1d2e to
2333d13
Compare
ormsbee
left a comment
There was a problem hiding this comment.
A few relatively minor questions and requests.
I'm a little uncomfortable with the amount of mocking necessary in the tests, and I do wish we could do a bit more in the way of end-to-end testing (e.g. creating a small library and exporting it). But that's probably not the best use of time at the moment.
Thank you!
| - Failed: Task failed and the export did not complete. | ||
| """ | ||
| ensure_cms("backup_library may only be executed in a CMS context") | ||
| set_code_owner_attribute_from_module(__name__) |
There was a problem hiding this comment.
This line shouldn't be necessary.
There was a problem hiding this comment.
I see that most of the tasks on this module have the @set_code_owner_attribute decorator, which is not compatible with the @shared_task decorator when using UserTask, that's why I'm manually setting it here, as it's done also on the existing course export/import tasks.
Although I'm not sure I understand 100% why it's needed there, are you sure this is not needed here?
There was a problem hiding this comment.
Setting the code owner is something we did at edX so that if the APM error monitoring started throwing a bunch of errors on the endpoint, the person on call would know who to contact to help triage the issue. It also makes it easier for teams to build dashboards around it. I guess there's no harm in keeping it, but I don't think anyone will be looking at the data that it generates.
There was a problem hiding this comment.
Thanks. This decorator is no longer needed. See:
There was a problem hiding this comment.
Thanks!, I'll remove it
There was a problem hiding this comment.
After removing set_code_owner_attribute_from_module, the "Semgrep code quality" check is now failing with:
test_root.semgrep.celery-missing-code-owner-function
Celery tasks need to be decorated with
@set_code_owner_attribute (from the
edx_django_utils.monitoring module) in order for us to
correctly track code-owners for errors and in other
monitoring.
For more information, see the Celery section of "Add
Code_Owner Custom Attributes to an IDA" in the Monitoring
How-Tos of <https://docs.openedx.org/projects/edx-django-
utils/en/latest/>.
It seems there is a test specific for this still up
There was a problem hiding this comment.
Apologies. You can restore it, or remove the test. It’s a DEPR that requires attention at some point.
There was a problem hiding this comment.
thanks, restored it for now.
|
|
||
| with open(file_path, 'rb') as zipfile: | ||
| artifact = UserTaskArtifact(status=self.status, name='Output') | ||
| artifact.file.save(name=os.path.basename(zipfile.name), content=File(zipfile)) |
There was a problem hiding this comment.
Will this generated file have a difficult-to-guess URL (e.g. a randomized value for the directory prefix to the file)? My understanding is that UserTaskArtifacts are not secured via auth by default, and that we have to rely on making difficult-to-brute-force URLs.
There was a problem hiding this comment.
For the default filesystem store, this is correct, it's not secured by default, however other stores may implement security themselves, for example, the S3 store does have this and generates a temporary signed url for the file.
In my tests, the files generates using the filesystem storage look like "/media/user_tasks/2025/10/03/lib-wgu-csprob-2025-10-03-153633.zip", the only thing that would make them kind of difficult to guess are the datetime in the file name.
I could add a random postfix to the filename, I'm not sure if I can control the directory as it's managed by the UserTasks library.
There was a problem hiding this comment.
If the S3 version has a temp signed url for the file, I think that's acceptable. This is one of the things we'll want to include in the release notes.
There was a problem hiding this comment.
Here is more info on how this behaves, from the django-user-tasks docs: https://django-user-tasks.readthedocs.io/en/latest/authorization.html#artifact-url-access
a164a55 to
fd78de7
Compare
| # Get the latest task for this user and library | ||
| task_status = UserTaskStatus.objects.filter( | ||
| user_id=user_id, | ||
| name__contains=str(library_key) |
There was a problem hiding this comment.
So at a low level, I think we'd want to filter by task_class (so there's no mixing of backup and restore tasks), and put a bound on created (automated processes may make many, many tasks over time).
But at a higher level, is this part of the API here to service the use case where someone wanders away from the backup page and then comes back to it? Because if so, I'd rather we just not handle that case yet, and require the user to stay on the page. It shouldn't take that long to create the backup, and it's not clear to me how long a backup should remain "good", or what exactly the logic for invalidating a backup download should be, if we allow people to access historical backups.
There was a problem hiding this comment.
ok, I removed that use case, now we always need to specify the task_id. thanks!
d40ac9e to
dc941af
Compare
ormsbee
left a comment
There was a problem hiding this comment.
Very small request, and then I think we're good to squash and merge. Thank you!
| task_status = None | ||
| if task_id: | ||
| # Get the specific task by ID | ||
| try: | ||
| task_status = UserTaskStatus.objects.get(task_id=task_id, user_id=user_id) | ||
| except UserTaskStatus.DoesNotExist: | ||
| return None | ||
|
|
||
| if not task_status: | ||
| return None |
There was a problem hiding this comment.
Since task_id is now required, we don't need all this any longer. We can do something more like:
| task_status = None | |
| if task_id: | |
| # Get the specific task by ID | |
| try: | |
| task_status = UserTaskStatus.objects.get(task_id=task_id, user_id=user_id) | |
| except UserTaskStatus.DoesNotExist: | |
| return None | |
| if not task_status: | |
| return None | |
| try: | |
| task_status = UserTaskStatus.objects.get(task_id=task_id, user_id=user_id) | |
| except UserTaskStatus.DoesNotExist: | |
| return None |
dc941af to
d56a1b0
Compare
|
|
||
| if task_status.state == UserTaskStatus.SUCCEEDED: | ||
| artifact = UserTaskArtifact.objects.get(status=task_status, name='Output') | ||
| result['url'] = artifact.file.storage.url(artifact.file.name) |
There was a problem hiding this comment.
@rodmgwgu CC @holaontiveros Do you think we can update this to return an absolute URL, before Ulmo? REST APIs should generally only return absolute URLs, doubly so if the field is called "url", and triply so if we're talking about file storage which may be on a totally separate server/domain like S3. But this API is currently returning just the path part.
I noticed this here in the corresponding frontend PR.
There was a problem hiding this comment.
@bradenmacdonald I modeled this one around the existing course export code here, which is doing the same, it seems that if it's S3, it will give us an absolute url, but for filestorage it returns a relative path.
But what you mention makes sense, I'll review this to always return an absoulte URL.
There was a problem hiding this comment.
@bradenmacdonald here is the PR addressing this issue: #37508
Description
Adds endpoints for Library v2 backup functionality, related to the new library import/export functionality being implemented in openedx-learning by @dwong2708.
Use Case
Example Requests
POST Response Values
If the import task is started successfully, an HTTP 200 "OK" response is returned.
The HTTP 200 response has the following values:
Example POST Response
{ "task_id": "7069b95b-ccea-4214-b6db-e00f27065bf7" }GET Parameters
A GET request can include the following parameters:
GET Response Values
If the import task is found successfully by the UUID provided, an HTTP 200 "OK" response is returned.
The HTTP 200 response has the following values:
Example GET Response
{ "state": "Succeeded", "url": "/media/user_tasks/2025/10/03/lib-wgu-csprob-2025-10-03-153633.zip" }Supporting information
Resolves: openedx/openedx-core#387
Required by: openedx/frontend-app-authoring#2448
Testing instructions
Legacy Libraries")/api/libraries/v2/lib:WGU:CSPROB/backup/(change the library key with yours){"task_id":"747e5ab3-5aef-4aef-88a3-6a195c849e12"}/api/libraries/v2/lib:WGU:CSPROB/backup/?task_id=747e5ab3-5aef-4aef-88a3-6a195c849e12(change the library key and the task_id to yours)Deadline
Ulmo Release
Other information